Nearest neighbor classification from multiple feature subsets

Author

  • Stephen D. Bay
Abstract

Combining multiple classifiers is an effective technique for improving accuracy. There are many general combining algorithms, such as Bagging, Boosting, or Error Correcting Output Coding, that significantly improve classifiers like decision trees, rule learners, or neural networks. Unfortunately, these combining methods do not improve the nearest neighbor classifier. In this paper, we present MFS, a combining algorithm designed to improve the accuracy of the nearest neighbor (NN) classifier. MFS combines multiple NN classifiers, each using only a random subset of features. The experimental results are encouraging: on 25 datasets from the UCI Repository, MFS significantly outperformed several standard NN variants and was competitive with boosted decision trees. In additional experiments, we show that MFS is robust to irrelevant features and is able to reduce both the bias and variance components of error.
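
The idea described in the abstract, several NN classifiers that each see only a random subset of the features, can be illustrated with a minimal sketch. The abstract does not give the details, so the following are assumptions made for illustration only: each component is a 1-NN classifier with squared Euclidean distance, feature subsets are drawn without replacement, and predictions are combined by simple majority vote. The names `nn_predict`, `mfs_predict`, `n_classifiers`, and `subset_size` are hypothetical and do not come from the paper.

```python
import random
from collections import Counter


def nn_predict(train_X, train_y, query, features):
    """1-NN prediction using squared Euclidean distance over `features` only."""
    best_label, best_dist = None, float("inf")
    for x, y in zip(train_X, train_y):
        dist = sum((x[f] - query[f]) ** 2 for f in features)
        if dist < best_dist:
            best_dist, best_label = dist, y
    return best_label


def mfs_predict(train_X, train_y, query, n_classifiers=25, subset_size=2, seed=0):
    """Combine several 1-NN classifiers, each restricted to a random feature subset.

    Assumptions (not taken from the abstract): subsets are sampled without
    replacement and the final label is chosen by simple majority vote.
    """
    rng = random.Random(seed)
    n_features = len(train_X[0])
    votes = Counter()
    for _ in range(n_classifiers):
        # Each component classifier sees all training examples,
        # but only a random subset of the feature indices.
        features = rng.sample(range(n_features), subset_size)
        votes[nn_predict(train_X, train_y, query, features)] += 1
    return votes.most_common(1)[0][0]


# Toy data: two well-separated classes described by three features.
train_X = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.1], [1.0, 1.0, 1.0], [0.9, 0.8, 1.0]]
train_y = ["a", "a", "b", "b"]

print(mfs_predict(train_X, train_y, [0.05, 0.1, 0.0]))
print(mfs_predict(train_X, train_y, [0.95, 0.9, 1.0]))
```

In this sketch the number of classifiers and the subset size are free parameters; how they should be chosen in practice is not stated in the abstract.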

Similar resources

An Improved K-Nearest Neighbor with Crow Search Algorithm for Feature Selection in Text Documents Classification

The Internet provides easy access to a kind of library resources. However, classification of documents from a large amount of data is still an issue and demands time and energy to find certain documents. Classification of similar documents in specific classes of data can reduce the time for searching the required data, particularly text documents. This is further facilitated by using Artificial...


ACO Based Feature Subset Selection for Multiple k-Nearest Neighbor Classifiers

The k-nearest neighbor (k-NN) is one of the most popular algorithms used for classification in various fields of pattern recognition & data mining problems. In k-nearest neighbor classification, the result of a new instance query is classified based on the majority of k-nearest neighbors. Recently researchers have begun paying attention to combining a set of individual k-NN classifiers, each us...


A Classification Method for E-mail Spam Using a Hybrid Approach for Feature Selection Optimization

Spam is an unwanted email that is harmful to communications around the world. Spam leads to a growing problem in a personal email, so it would be essential to detect it. Machine learning is very useful to solve this problem as it shows good results in order to learn all the requisite patterns for classification due to its adaptive existence. Nonetheless, in spam detection, there are a large num...


Nearest Neighbor Ensembles Combines with Weighted Instance and Feature Sub Set Selection: A Survey

Ensemble learning deals with methods which employ multiple learners to solve a problem. The generalization ability of an ensemble is usually significantly better than that of a single learner, so ensemble methods are very attractive; at the same time, the feature selection process of an ensemble technique plays an important role in the classifier. This paper presents the analysis on classification technique of ...


Journal:
  • Intell. Data Anal.

Volume 3

Publication date: 1999